feat: LangChain-enhanced task completion detection for keepalive #459
- Add llm_provider.py with GitHub Models → OpenAI → regex fallback chain
- Add codex_jsonl_parser.py for parsing Codex --json event streams
- Add codex_session_analyzer.py for task completion detection
- Add langchain optional dependency to pyproject.toml
- Add comprehensive tests for all new modules (38 tests)
- Add integration plan document with data source options

This implements Phase 0 of the LangChain keepalive integration:

- Option A: Summary only (current --output-last-message)
- Option B: Full JSONL stream (--json mode) - recommended
- Option B filtered: High-value events only (agent_message, reasoning, todo_list)

The JSONL parser handles Codex event schema variations, including:

- Old (assistant_message) and new (agent_message) field names
- Streaming item updates
- Todo list items for direct task mapping

Refs #453
- Add SESSION_JSONL variable for PR-specific session file naming
- Change Codex execution to use the --json flag, redirecting the JSONL stream to a file
- Add an 'Analyze Codex session' step that parses session data with codex_jsonl_parser
- Output session metrics (events, messages, commands, file changes, todos)
- Include codex-session*.jsonl in artifact uploads

Part of #454: LangChain-enhanced task completion detection
- Add scripts/analyze_codex_session.py CLI for session analysis
  - Extract tasks from PR body checkboxes
  - Run LLM analysis via GitHub Models API (with OpenAI/regex fallback)
  - Support JSON, markdown, and github-actions output formats
  - Update PR body checkboxes based on completion detection
- Enhance workflow with a dedicated LLM analysis step
  - New 'Analyze task completion with LLM' step after session parsing
  - Fetches the PR body via gh CLI to extract tasks
  - Outputs completion results to GITHUB_OUTPUT
- Add 17 tests for the CLI (100% pass)
  - Task extraction from PR body
  - Checkbox update logic
  - CLI integration tests

Part of #454: LangChain-enhanced task completion detection
- Add llmCompletedTasks parameter to autoReconcileTasks()
  - LLM tasks take priority over commit-based analysis
  - Commit analysis adds supplementary matches not covered by LLM
  - Deduplicates matches by task text (case-insensitive)
- Add LLM analysis outputs to reusable-codex-run.yml
  - llm-analysis-run: whether analysis was performed
  - llm-completed-tasks: JSON array of completed tasks
  - llm-has-completions: boolean for quick check
  - session-event-count, session-todo-count for metrics
- Save analysis JSON file for debugging (codex-analysis-{PR}.json)
  - Uploaded as artifact alongside session JSONL
- Update keepalive workflows to pass LLM tasks
  - agents-keepalive-loop.yml
  - templates/consumer-repo/.github/workflows/agents-keepalive-loop.yml
All 63 JS tests pass, all 55 Python tests pass.
Part of #454: LangChain-enhanced task completion detection
- Add llm_provider, llm_confidence, llm_analysis_run inputs to updateKeepaliveLoopSummary
- Display a 🧠 Task Analysis section showing which provider was used
- Show a warning when a fallback provider (OpenAI or regex) was used
- Add llm-provider and llm-confidence outputs to reusable-codex-run.yml
- Update agents-keepalive-loop.yml to pass LLM info to the summary
- Update the consumer template with the same changes
- Add 3 tests for LLM provider display scenarios

This gives users visibility into whether the primary GitHub Models provider was used or whether the system fell back to OpenAI or regex.
Status: ✅ no new diagnostics
Automated Status Summary

Head SHA: 7de54ca
Updated automatically; will refresh on subsequent CI/Docker completions.

Keepalive checklist

Scope
Tasks
Acceptance criteria
🤖 Keepalive Loop Status

PR #459 | Agent: Codex | Iteration 0/5

Current State
🔍 Failure Classification

Error type: infrastructure
Pull request overview
This PR adds LangChain-based LLM analysis for intelligent task completion detection in the keepalive automation loop. The implementation introduces a provider fallback chain (GitHub Models → OpenAI → Regex), JSONL session parsing from Codex --json output, and integrates task analysis results into PR updates and summary comments.
Key changes:
- New Python modules for LLM provider abstraction, JSONL parsing, and session analysis
- Workflow modifications to capture --json output and run LLM analysis
- JavaScript updates to display LLM provider information and merge LLM-detected tasks with commit-based detection
- 20 new tests covering Python CLI, analysis, and JavaScript display logic
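The provider fallback chain described above can be sketched roughly as follows. The class and function names mirror the PR's tools/llm_provider.py, but the availability checks shown here are assumptions, not the module's actual logic:

```python
import os

class GitHubModelsProvider:
    name = "github-models"

    def available(self) -> bool:
        # Primary provider: assumed to require a GitHub token for the Models API.
        return bool(os.environ.get("GITHUB_TOKEN"))

class OpenAIProvider:
    name = "openai"

    def available(self) -> bool:
        # First fallback: assumed to require an OpenAI API key.
        return bool(os.environ.get("OPENAI_API_KEY"))

class RegexFallbackProvider:
    name = "regex"

    def available(self) -> bool:
        # Last resort: pure keyword matching, no credentials required.
        return True

def get_llm_provider():
    """Return the first available provider in priority order."""
    for provider in (GitHubModelsProvider(), OpenAIProvider(), RegexFallbackProvider()):
        if provider.available():
            return provider
```

Because the regex provider is always available, the chain can never return nothing, which is what makes the later "which provider was used" summary display meaningful.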
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| tools/llm_provider.py | LLM provider abstraction with GitHub Models/OpenAI/Regex fallback chain |
| tools/codex_jsonl_parser.py | Parser for Codex JSONL event stream from --json output |
| tools/codex_session_analyzer.py | Orchestrates session analysis using parsed JSONL and LLM providers |
| scripts/analyze_codex_session.py | CLI entry point for analyzing sessions from GitHub Actions |
| tests/tools/test_llm_provider.py | Unit tests for provider availability and fallback behavior |
| tests/tools/test_codex_jsonl_parser.py | Tests for JSONL parsing including schema variations |
| tests/scripts/test_analyze_codex_session.py | CLI integration tests with subprocess calls |
| .github/workflows/reusable-codex-run.yml | Captures --json output and runs analysis steps |
| templates/consumer-repo/.github/workflows/agents-keepalive-loop.yml | Template workflow integrating LLM task detection |
| .github/workflows/agents-keepalive-loop.yml | Passes LLM metadata to summary comment generation |
| .github/scripts/keepalive_loop.js | Displays LLM provider info and merges LLM/commit task sources |
| .github/scripts/__tests__/keepalive-loop.test.js | Tests for LLM provider display in PR summaries |
| pyproject.toml | Adds optional langchain dependencies |
| docs/plans/langchain-keepalive-integration.md | Planning document describing architecture and options |
```js
summaryLines.push(
  '',
  '### 🧠 Task Analysis',
  `| Provider | ${providerIcon} ${providerLabel} |`,
  `| Confidence | ${confidencePercent}% |`,
```
The markdown table formatting is incomplete. Lines 1231-1232 create table rows without proper markdown table syntax (missing header separator and consistent column structure). The output will render as plain text rather than a table. Add proper table headers and separators, for example:
| Field | Value |
|-------|-------|
| Provider | ... |
| Confidence | ... |
```python
# GitHub Models API endpoint (OpenAI-compatible)
GITHUB_MODELS_BASE_URL = "https://models.inference.ai.azure.com"
DEFAULT_MODEL = "gpt-4o-mini"
```
The PR description states the primary provider uses "gpt-4.1-mini", but the code actually uses "gpt-4o-mini" (line 28). This is a discrepancy between documentation and implementation. "gpt-4.1-mini" doesn't appear to be a valid OpenAI model name. Update the PR description to reflect the actual model being used.
```sh
python3 << 'PYEOF'
import os
import sys
sys.path.insert(0, '.')
```
The inline Python script sets sys.path.insert(0, '.') (line 465) but PYTHONPATH is already set to github.workspace (line 438). The relative path '.' may not resolve correctly depending on the working directory at execution time. For consistency and reliability, use the PYTHONPATH that's already configured or use an absolute path based on github.workspace.
```python
with patch("tools.llm_provider.get_llm_provider") as mock_provider:
    from tools.llm_provider import RegexFallbackProvider

    mock_provider.return_value = RegexFallbackProvider()

    result = subprocess.run(
        [
            sys.executable,
            "scripts/analyze_codex_session.py",
            "--session-file",
            str(sample_session_file),
            "--pr-body-file",
            str(sample_pr_body_file),
            "--output",
            "json",
            "--update-pr-body",
            "--updated-body-file",
            str(updated_file),
        ],
        capture_output=True,
        text=True,
        cwd=Path(__file__).parent.parent.parent,
    )

    assert result.returncode == 0
```
The mock provider is set up using a context manager (lines 256-259), but the subprocess.run call (lines 261-278) spawns a separate Python process that won't inherit this mock. The patch only affects the current test process, not the subprocess. This test will actually use the real provider chain, not the mocked RegexFallbackProvider. To properly test this, either mock at the subprocess level (via environment manipulation) or refactor the CLI to be testable without subprocess calls.
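One way to make such a test effective, sketched here under the assumption that the CLI's core logic can be factored into an importable function (the names `get_llm_provider` and `analyze_session` below are hypothetical stand-ins, not the repo's actual API), is to run the analysis in-process so `unittest.mock.patch` actually applies:

```python
from unittest.mock import patch

class RegexFallbackProvider:
    name = "regex"

# Hypothetical stand-in for tools.llm_provider.get_llm_provider.
def get_llm_provider():
    raise RuntimeError("would pick a real provider here")

def analyze_session(session_text: str, tasks: list[str]) -> dict:
    """Hypothetical in-process equivalent of the CLI body."""
    provider = get_llm_provider()
    return {"provider": provider.name, "tasks": tasks}

# Patching the hook in the *same* process is effective, unlike patching
# around a subprocess.run call, which the child process never sees.
with patch(f"{__name__}.get_llm_provider", return_value=RegexFallbackProvider()):
    result = analyze_session("session output", ["Fix parser"])
```

The subprocess-based test can still be kept for end-to-end coverage, but it should not claim to exercise a mocked provider.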
```python
# Reuse the same prompt building logic
github_provider = GitHubModelsProvider()
prompt = github_provider._build_analysis_prompt(session_output, tasks, context)

try:
    response = client.invoke(prompt)
    result = github_provider._parse_response(response.content, tasks)
```
The OpenAI provider instantiates a GitHubModelsProvider just to reuse its private methods. This creates an unnecessary object and tightly couples the two providers. Consider extracting the prompt building and response parsing into shared helper functions or methods on a base class.
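A minimal sketch of the suggested refactor follows; the method names and the trivial parse logic are illustrative, not the module's actual private API:

```python
class BaseLLMProvider:
    """Shared prompt/parse logic so concrete providers stay decoupled."""

    def build_analysis_prompt(self, session_output: str, tasks: list[str]) -> str:
        task_list = "\n".join(f"- {t}" for t in tasks)
        return f"Session output:\n{session_output}\n\nTasks:\n{task_list}"

    def parse_response(self, content: str, tasks: list[str]) -> list[str]:
        # Minimal stand-in: a task counts as complete if the model echoed it.
        return [t for t in tasks if t in content]

class GitHubModelsProvider(BaseLLMProvider):
    name = "github-models"

class OpenAIProvider(BaseLLMProvider):
    name = "openai"  # no longer needs to instantiate GitHubModelsProvider
```

With the shared behavior on a base class, each provider only overrides transport details, and neither reaches into the other's private methods.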
```python
for task in completed_tasks:
    # Escape special regex characters in task
    escaped_task = re.escape(task)

    # Pattern to match unchecked checkbox with this task
    pattern = re.compile(
        rf"^([\s]*-\s*)\[ \](\s*){escaped_task}",
        re.MULTILINE,
    )

    # Replace with checked version
    updated_body = pattern.sub(rf"\1[x]\2{task}", updated_body)
```
The regex pattern on line 109 requires an exact match of the task text after escaping, but the escaped task is inserted unescaped back into the replacement on line 114. If the task contains characters that were escaped (e.g., parentheses, brackets), the replacement will use the original unescaped version, which could cause the pattern to not match correctly on subsequent calls or create inconsistencies. Use escaped_task in the replacement as well, or use the original task variable in both places consistently.
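One robust fix, sketched with an assumed function name, is to build the replacement with a callable so the task text is always inserted literally and never re-interpreted as regex replacement syntax:

```python
import re

def check_completed_tasks(body: str, completed_tasks: list[str]) -> str:
    """Tick unchecked checkboxes for completed tasks.

    re.escape() handles metacharacters on the *match* side; a callable
    replacement inserts the task literally on the *substitution* side, so
    tasks containing parentheses, brackets, or backslashes round-trip intact.
    """
    for task in completed_tasks:
        pattern = re.compile(
            rf"^(\s*-\s*)\[ \](\s*){re.escape(task)}",
            re.MULTILINE,
        )
        body = pattern.sub(lambda m, t=task: f"{m.group(1)}[x]{m.group(2)}{t}", body)
    return body
```

Because the checked line reproduces the original task text exactly, a second pass over an already-updated body matches nothing and the operation stays idempotent.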
```python
for task in tasks:
    task_lower = task.lower()
    # Simple keyword matching
    task_words = set(task_lower.split())

    # Check for completion signals
    is_completed = any(
        word in output_lower
        and any(
            p in output_lower
            for p in ["completed", "finished", "done", "fixed", "✓", "[x]"]
        )
        for word in task_words
        if len(word) > 3
    )

    # Check for progress signals
    is_in_progress = any(
        word in output_lower
        and any(
            p in output_lower
            for p in ["working on", "started", "implementing", "in progress"]
        )
        for word in task_words
        if len(word) > 3
    )

    # Check for blocker signals
    is_blocked = any(
        word in output_lower
        and any(
            p in output_lower for p in ["blocked", "stuck", "failed", "error", "cannot"]
        )
        for word in task_words
        if len(word) > 3
    )

    if is_completed:
        completed.append(task)
    elif is_blocked:
        blocked.append(task)
    elif is_in_progress:
        in_progress.append(task)
```
The regex fallback matching logic has a high likelihood of false positives. The current logic checks if any task word (longer than 3 characters) appears anywhere in the output along with a completion keyword anywhere else in the output. For example, if the output contains "completed refactoring" and a task is "Update tests", both "update" and "tests" are unrelated to "completed refactoring", but if either word appears anywhere in the output, the task would be marked as completed. Consider requiring proximity between the task words and status keywords, or using the defined but unused COMPLETION_PATTERNS, PROGRESS_PATTERNS, and BLOCKER_PATTERNS regex patterns.
```python
# Patterns indicating task completion
COMPLETION_PATTERNS = [
    r"(?:completed?|finished|done|implemented|fixed|resolved)\s+(?:the\s+)?(.+?)(?:\.|$)",
    r"✓\s+(.+?)(?:\.|$)",
    r"\[x\]\s+(.+?)(?:\.|$)",
    r"successfully\s+(?:completed?|implemented|fixed)\s+(.+?)(?:\.|$)",
]

# Patterns indicating work in progress
PROGRESS_PATTERNS = [
    r"(?:working on|started|beginning|implementing)\s+(.+?)(?:\.|$)",
    r"(?:in progress|ongoing):\s*(.+?)(?:\.|$)",
]

# Patterns indicating blockers
BLOCKER_PATTERNS = [
    r"(?:blocked|stuck|cannot|failed|error)\s+(?:on\s+)?(.+?)(?:\.|$)",
    r"(?:issue|problem|bug)\s+(?:with\s+)?(.+?)(?:\.|$)",
]
```
The COMPLETION_PATTERNS, PROGRESS_PATTERNS, and BLOCKER_PATTERNS class variables are defined but never used. The analyze_completion method implements its own simpler keyword matching instead. Either remove these unused patterns or refactor the logic to use them.
```python
items = event.get("items", [])
if not items and content:
    # Try to parse from content
    import contextlib
```
The import of contextlib is done inside the method body rather than at the module level. This is unconventional and adds unnecessary overhead on each call. Move this import to the top of the file with other imports.
```diff
  eval "codex exec --json --skip-git-repo-check --sandbox \"$SANDBOX\" --output-last-message \"$OUTPUT_FILE\" $EXTRA_ARGS \"\$(cat \"\$PROMPT_FILE\")\"" > "$SESSION_JSONL" 2>&1 || CODEX_EXIT=$?
  else
-   codex exec --skip-git-repo-check --sandbox "$SANDBOX" --output-last-message "$OUTPUT_FILE" "$(cat "$PROMPT_FILE")" || CODEX_EXIT=$?
+   codex exec --json --skip-git-repo-check --sandbox "$SANDBOX" --output-last-message "$OUTPUT_FILE" "$(cat "$PROMPT_FILE")" > "$SESSION_JSONL" 2>&1 || CODEX_EXIT=$?
```
The codex exec command redirects both stdout and stderr to the SESSION_JSONL file (using > "$SESSION_JSONL" 2>&1). This means any stderr output (warnings, errors, debug messages) will be mixed with the JSONL events, which could cause parsing failures. Consider separating stderr or using a more robust approach like tee to capture stdout while still allowing stderr to be visible in logs, or redirect stderr separately.
Root cause: the reusable workflow was calling scripts/analyze_codex_session.py, but the scripts were only available in the Workflows repo, not in consumer repos that call the reusable workflow.

Changes:
- Expanded sparse checkout to include the scripts/ and tools/ directories
- Made the Workflows repo checkout ref dynamic (github.job_workflow_sha) so testing feature branches works correctly
- Updated PYTHONPATH to include .workflows-lib for imports
- Fixed script paths to use the .workflows-lib/ prefix
- Added an LLM dependency installation step from .workflows-lib/tools/requirements.txt
- Added requirements.txt for LLM dependencies (langchain-openai)
- Added error output display for debugging when LLM analysis fails
github.job_workflow_sha doesn't work correctly with checkout@v4 when using sparse-checkout. Instead, extract the ref from github.workflow_ref, which contains the full path including the ref (e.g., refs/heads/feature/langchain-analysis).
Temporarily disable sparse-checkout to do a full checkout and ensure the scripts/ and tools/ directories are available. Will re-enable sparse-checkout once the checkout issue is resolved.
Add debugging to understand what context variables are available and use fetch-depth: 0 to ensure the SHA is fetchable when using job_workflow_sha.
Add a new input 'workflows_ref' that callers can use to specify which ref of the Workflows repo to checkout for scripts. This is needed because github.job_workflow_sha is not available in reusable workflow context. Callers should set workflows_ref to match their @ref in the uses: line. Defaults to 'main'.
Automated Status Summary
Scope
- `GITHUB_STEP_SUMMARY` output so iteration results are visible in the Actions UI

Tasks
- `agent:codex` label
- `agents-keepalive-loop.yml` after agent run
- `buildStatusBlock()` in `agents_pr_meta_update_body.js` to accept `agentType` parameter
- `agentType` is set (CLI agent): hide workflow table, hide head SHA/required checks
- (`agent:*` label): `<!-- gate-summary: -->` comment posting (use step summary instead)
- `<!-- keepalive-round: N -->` instruction comments (task appendix replaces this)
- `<!-- keepalive-loop-summary -->` to be the single source of truth
- (`agent:*` label): `<!-- gate-summary: -->` comment
- `agent_type` output to detect job so downstream workflows know the mode
- `agents-pr-meta.yml` to conditionally skip gate summary for CLI agent PRs

Acceptance criteria
Head SHA: ac4aa0e
Latest Runs: ✅ success — Gate
Required: gate: ✅ success